<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" xmlns:m="http://www.w3.org/1998/Math/MathML" xmlns:pls="http://www.w3.org/2005/01/pronunciation-lexicon" xmlns:ssml="http://www.w3.org/2001/10/synthesis" xmlns:svg="http://www.w3.org/2000/svg">
  <head>
    <title>Characters</title>
    <link rel="stylesheet" type="text/css" href="docbook-epub.css"/>
    <link rel="stylesheet" type="text/css" href="kawa.css"/>
    <script src="kawa-ebook.js" type="text/javascript"/>
    <meta name="generator" content="DocBook XSL-NS Stylesheets V1.79.1"/>
    <link rel="prev" href="Characters-and-text.xhtml" title="Characters and text"/>
    <link rel="next" href="Character-sets.xhtml" title="Character sets"/>
  </head>
  <body>
    <header/>
    <section class="sect1" title="Characters" epub:type="subchapter" id="Characters">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both">Characters</h2>
          </div>
        </div>
      </div>
      <p>Characters are objects that represent human-readable characters
such as letters and digits.  More precisely, a character
represents a <a class="ulink" href="http://www.unicode.org/glossary/#unicode_scalar_value" target="_top">Unicode scalar value</a>. Each character has an integer value
in the range <code class="literal">0</code> to <code class="literal">#x10FFFF</code>
(excluding the range <code class="literal">#xD800</code> to <code class="literal">#xDFFF</code>
used for <a class="ulink" href="http://www.unicode.org/glossary/#surrogate_code_point" target="_top">Surrogate Code Points</a>).
</p>
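      <p>The range above can be checked directly. The following is a minimal sketch, not part of Kawa: <code class="literal">scalar-value?</code> is a hypothetical helper name, written using only standard R7RS procedures.
</p>
      <pre class="screen">;; Hypothetical helper: is n a Unicode scalar value?
(define (scalar-value? n)
  (and (exact-integer? n)
       (or (&lt;= 0 n #xD7FF)             ; below the surrogate range
           (&lt;= #xE000 n #x10FFFF))))   ; above it, up to the last code point

(scalar-value? #x41)       ; ⇒ #t
(scalar-value? #xD800)     ; ⇒ #f  (surrogate code point)
(scalar-value? #x110000)   ; ⇒ #f  (beyond the Unicode range)
</pre>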
      <div class="blockquote">
        <blockquote class="blockquote">
          <p><span class="emphasis"><em>Note:</em></span>
Unicode distinguishes
between glyphs, which are printed for humans to read, and characters,
which are abstract entities that map to glyphs (sometimes in a way
that’s sensitive to surrounding characters). Furthermore, different
sequences of scalar values sometimes correspond to the same
character. The relationships among scalar values, characters, and glyphs are
subtle and complex.
</p>
          <p>Despite this complexity, most things that a literate human would call
a “character” can be represented by a single Unicode scalar value
(although several sequences of Unicode scalar values may represent
that same character). For example, Roman letters, Cyrillic letters,
Hebrew consonants, and most Chinese characters fall into this
category.
</p>
          <p>Unicode scalar values exclude the range <code class="literal">#xD800</code> to <code class="literal">#xDFFF</code>,
which is part of the range of Unicode <em class="firstterm">code points</em>.
However, the Unicode code points in this range, the so-called
<em class="firstterm">surrogates</em>, are an artifact of the UTF-16 encoding, and can only
appear in specific Unicode encodings, and even then only in pairs that
encode scalar values.  Consequently, all characters represent code
points, but the surrogate code points do not have representations as
characters.
</p>
        </blockquote>
      </div>
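      <p>To illustrate how a pair of surrogates encodes a single scalar value, here is a sketch of the standard UTF-16 arithmetic. The procedure name <code class="literal">scalar-&gt;surrogates</code> is made up for this example, and it assumes its argument lies in the range <code class="literal">#x10000</code> to <code class="literal">#x10FFFF</code>.
</p>
      <pre class="screen">;; Hypothetical helper: split a supplementary scalar value into
;; its UTF-16 high and low surrogate code units.
;; Assumes #x10000 &lt;= sv &lt;= #x10FFFF.
(define (scalar-&gt;surrogates sv)
  (let ((v (- sv #x10000)))                    ; 20-bit offset
    (list (+ #xD800 (quotient v #x400))        ; high surrogate: top 10 bits
          (+ #xDC00 (remainder v #x400)))))    ; low surrogate: bottom 10 bits

(scalar-&gt;surrogates #x1F600)   ; ⇒ (55357 56832), that is, #xD83D and #xDE00
</pre>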
      <p class="synopsis" kind="Type"><span class="kind">Type</span><span class="ignore">: </span><a id="idm139667875912768" class="indexterm"/> <code class="function">character</code></p>
      <div class="blockquote">
        <blockquote class="blockquote">
          <p>A Unicode code point - normally a Unicode scalar value,
but could be a surrogate.
This is implemented using a 32-bit <code class="literal">int</code>.
When an object is needed (i.e. the <em class="firstterm">boxed</em> representation),
it is implemented as an instance of <code class="literal">gnu.text.Char</code>.
</p>
        </blockquote>
      </div>
      <p class="synopsis" kind="Type"><span class="kind">Type</span><span class="ignore">: </span><a id="idm139667875908384" class="indexterm"/> <code class="function">character-or-eof</code></p>
      <div class="blockquote">
        <blockquote class="blockquote">
          <p>A <code class="literal">character</code> or the special <code class="literal">#!eof</code> value (used to indicate
end-of-file when reading from a port).
This is implemented using a 32-bit <code class="literal">int</code>,
where the value -1 indicates end-of-file.
When an object is needed, it is implemented as an instance of
<code class="literal">gnu.text.Char</code> or the special <code class="literal">#!eof</code> object.
</p>
        </blockquote>
      </div>
      <p class="synopsis" kind="Type"><span class="kind">Type</span><span class="ignore">: </span><a id="idm139667875903088" class="indexterm"/> <code class="function">char</code></p>
      <div class="blockquote">
        <blockquote class="blockquote">
          <p>A UTF-16 code unit.  The same as the Java primitive <code class="literal">char</code> type.
Considered to be a sub-type of <code class="literal">character</code>.
When an object is needed, it is implemented as an instance
of <code class="literal">java.lang.Character</code>.  Note the unfortunate inconsistency
(for historical reasons) of <code class="literal">char</code> boxed as <code class="literal">Character</code>
vs <code class="literal">character</code> boxed as <code class="literal">Char</code>.
</p>
        </blockquote>
      </div>
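      <p>As a rough sketch of where these type names might appear, the following loop declares the result of <code class="literal">read-char</code> as a <code class="literal">character-or-eof</code>. It assumes Kawa's usual <code class="literal">::</code> type-specifier syntax in <code class="literal">let</code> bindings; <code class="literal">count-chars</code> is a hypothetical name used only for illustration.
</p>
      <pre class="screen">;; Hypothetical helper: count the characters remaining on a port.
;; The "::character-or-eof" specifier is an assumption about how the
;; type would typically be declared; the code also works without it.
(define (count-chars port)
  (let loop ((n 0))
    (let ((ch ::character-or-eof (read-char port)))
      (if (eof-object? ch)
          n                    ; reached end-of-file
          (loop (+ n 1))))))

(count-chars (open-input-string "abc"))   ; ⇒ 3
</pre>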
      <p>Characters are written using the notation
<code class="literal">#\</code><em class="replaceable"><code>character</code></em> (which stands for the given <em class="replaceable"><code>character</code></em>);
<code class="literal">#\x</code><em class="replaceable"><code>hex-scalar-value</code></em> (the character whose scalar value
is the given hex integer);
or <code class="literal">#\</code><em class="replaceable"><code>character-name</code></em> (a character with a given name):
</p>
      <div class="literallayout">
        <p><a id="idm139667875893440" class="indexterm"/><span id="meta-character"/><em class="replaceable"><code>character</code></em> <code class="literal">::=</code> <code class="literal"><span class="bold"><strong>#\</strong></span></code><em class="replaceable"><code>any-character</code></em><br/>
        | <code class="literal"><span class="bold"><strong>#\</strong></span></code> <em class="replaceable"><code>character-name</code></em><br/>
        | <code class="literal"><span class="bold"><strong>#\x</strong></span></code> <a class="link" href="Lexical-syntax.xhtml#meta-hex-scalar-value"><em class="replaceable"><code>hex-scalar-value</code></em></a><br/>
        | <code class="literal"><span class="bold"><strong>#\X</strong></span></code> <a class="link" href="Lexical-syntax.xhtml#meta-hex-scalar-value"><em class="replaceable"><code>hex-scalar-value</code></em></a><br/>
</p>
      </div>
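      <p>A few expressions illustrating the notations in this grammar; this is a sketch that relies only on the forms listed above and on standard procedures:
</p>
      <pre class="screen">(char=? #\A #\x41)        ; ⇒ #t   (literal character vs. hex scalar value)
(char=? #\x41 #\X41)      ; ⇒ #t   (the #\x and #\X prefixes are equivalent)
(char-&gt;integer #\x)       ; ⇒ 120  (#\x with no hex digits is the letter x)
(char=? #\space #\x20)    ; ⇒ #t   (character name vs. hex scalar value)
</pre>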
      <p>The following <em class="replaceable"><code>character-name</code></em> forms are recognized:
</p>
      <div class="variablelist" epub:type="list">
        <dl class="variablelist">
          <dt class="term"><code class="literal"><span class="bold"><strong>#\alarm</strong></span></code>
</dt>
          <dd>
            <p><code class="literal">#\x0007</code> - the alarm (bell) character
</p>
          </dd>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\backspace</strong></span></code>
</dt>
          <dd>
            <p><code class="literal">#\x0008</code>
</p>
          </dd>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\delete</strong></span></code>
</dt>
          <dd/>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\del</strong></span></code>
</dt>
          <dd/>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\rubout</strong></span></code>
</dt>
          <dd>
            <p><code class="literal">#\x007f</code> - the delete or rubout character
</p>
          </dd>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\escape</strong></span></code>
</dt>
          <dd/>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\esc</strong></span></code>
</dt>
          <dd>
            <p><code class="literal">#\x001b</code>
</p>
          </dd>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\newline</strong></span></code>
</dt>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\linefeed</strong></span></code>
</dt>
          <dd>
            <p><code class="literal">#\x000a</code> - the linefeed character
</p>
          </dd>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\null</strong></span></code>
</dt>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\nul</strong></span></code>
</dt>
          <dd>
            <p><code class="literal">#\x0000</code> - the null character
</p>
          </dd>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\page</strong></span></code>
</dt>
          <dd>
            <p><code class="literal">#\x000c</code> - the formfeed character
</p>
          </dd>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\return</strong></span></code>
</dt>
          <dd>
            <p><code class="literal">#\x000d</code> - the carriage return character
</p>
          </dd>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\space</strong></span></code>
</dt>
          <dd>
            <p><code class="literal">#\x0020</code> - the preferred way to write a space
</p>
          </dd>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\tab</strong></span></code>
</dt>
          <dd>
            <p><code class="literal">#\x0009</code> - the tab character
</p>
          </dd>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\vtab</strong></span></code>
</dt>
          <dd>
            <p><code class="literal">#\x000b</code> - the vertical tabulation character
</p>
          </dd>
          <dt class="term"><code class="literal"><span class="bold"><strong>#\ignorable-char</strong></span></code>
</dt>
          <dd>
            <p>A special <code class="literal">character</code> value, but it is not a Unicode code point.
It is a special value returned when an index refers to the second
<code class="literal">char</code> (code unit) of a surrogate pair, and which should be ignored.
(When writing a <code class="literal">character</code> to a string or file,
it will be written as one or two <code class="literal">char</code> values.
The exception is <code class="literal">#\ignorable-char</code>, for which zero 
<code class="literal">char</code> values are written.)
</p>
          </dd>
        </dl>
      </div>
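      <p>The names in this list can be checked against their scalar values. The expressions below are a small sketch using only names from the list:
</p>
      <pre class="screen">(char-&gt;integer #\alarm)             ; ⇒ 7
(char-&gt;integer #\tab)               ; ⇒ 9
(char-&gt;integer #\space)             ; ⇒ 32
(char=? #\escape #\esc)             ; ⇒ #t  (two names for #\x001b)
(char=? #\delete #\del #\rubout)    ; ⇒ #t  (three names for #\x007f)
(char=? #\null #\nul)               ; ⇒ #t  (two names for #\x0000)
</pre>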
      <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667875855616" class="indexterm"/> <code class="function">char?</code> <em class="replaceable"><code><em class="replaceable"><code>obj</code></em></code></em></p>
      <div class="blockquote">
        <blockquote class="blockquote">
          <p>Return <code class="literal">#t</code> if <em class="replaceable"><code>obj</code></em> is a character, <code class="literal">#f</code> otherwise.
(The <em class="replaceable"><code>obj</code></em> can be any character, not just a 16-bit <code class="literal">char</code>.)
</p>
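          <p>For instance (a brief sketch):
</p>
          <pre class="screen">(char? #\a)          ; ⇒ #t
(char? #\x1F600)     ; ⇒ #t  (a character outside the 16-bit range)
(char? "a")          ; ⇒ #f  (a string, not a character)
(char? 97)           ; ⇒ #f  (an integer, not a character)
</pre>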
        </blockquote>
      </div>
      <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667875850032" class="indexterm"/> <code class="function">char-&gt;integer</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em></p>
      <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667875847072" class="indexterm"/> <code class="function">integer-&gt;char</code> <em class="replaceable"><code><em class="replaceable"><code>sv</code></em></code></em></p>
      <div class="blockquote">
        <blockquote class="blockquote">
          <p><em class="replaceable"><code>sv</code></em> should be a Unicode scalar value, i.e., a non-negative exact
integer object in <code class="literal">[0, #xD7FF] union [#xE000, #x10FFFF]</code>.
(Kawa also allows values in the surrogate range.)
</p>
          <p>Given a character, <code class="literal">char-&gt;integer</code> returns its Unicode scalar value
as an exact integer object.  For a Unicode scalar value <em class="replaceable"><code>sv</code></em>,
<code class="literal">integer-&gt;char</code> returns its associated character.
</p>
          <pre class="screen">(integer-&gt;char 32)                     ⇒ #\space
(char-&gt;integer (integer-&gt;char 5000))   ⇒ 5000
(integer-&gt;char #\xD800)                ⇒ throws ClassCastException
</pre>
          <p><span class="emphasis"><em>Performance note:</em></span> A call to <code class="literal">char-&gt;integer</code> is compiled as
casting the argument to a <code class="literal">character</code>, and then re-interpreting
that value as an <code class="literal">int</code>.
A call to <code class="literal">integer-&gt;char</code> is compiled as
casting the argument to an <code class="literal">int</code>, and then re-interpreting
that value as a <code class="literal">character</code>.
If the argument is the right type, no code is emitted: the value is
just re-interpreted as the result type.
</p>
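          <p>The two conversions are commonly combined to do arithmetic on characters. A minimal sketch follows; <code class="literal">char-offset</code> is a hypothetical helper, and it assumes the resulting integer is still a valid argument to <code class="literal">integer-&gt;char</code>.
</p>
          <pre class="screen">;; Hypothetical helper: the character n positions after ch.
(define (char-offset ch n)
  (integer-&gt;char (+ (char-&gt;integer ch) n)))

(char-offset #\a 1)     ; ⇒ #\b
(char-offset #\A 25)    ; ⇒ #\Z
</pre>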
        </blockquote>
      </div>
      <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667875836784" class="indexterm"/> <code class="function">char=?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>1</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>2</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>3</sub></code></em> <em class="replaceable"><code>…</code></em></p>
      <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667875831168" class="indexterm"/> <code class="function">char&lt;?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>1</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>2</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>3</sub></code></em> <em class="replaceable"><code>…</code></em></p>
      <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667875825552" class="indexterm"/> <code class="function">char&gt;?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>1</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>2</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>3</sub></code></em> <em class="replaceable"><code>…</code></em></p>
      <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667875819936" class="indexterm"/> <code class="function">char&lt;=?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>1</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>2</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>3</sub></code></em> <em class="replaceable"><code>…</code></em></p>
      <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667875814320" class="indexterm"/> <code class="function">char&gt;=?</code> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>1</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>2</sub></code></em> <em class="replaceable"><code><em class="replaceable"><code>char</code></em></code></em><em class="replaceable"><code><sub>3</sub></code></em> <em class="replaceable"><code>…</code></em></p>
      <div class="blockquote">
        <blockquote class="blockquote">
          <p>These procedures impose a total ordering on the set of characters
according to their Unicode scalar values.
</p>
          <pre class="screen">(char&lt;? #\z #\ß)      ⇒ #t
(char&lt;? #\z #\Z)      ⇒ #f
</pre>
          <p><span class="emphasis"><em>Performance note:</em></span>  This is compiled as if converting each
argument using <code class="literal">char-&gt;integer</code> (which requires no code)
and then using the corresponding <code class="literal">int</code> comparison.
</p>
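          <p>With more than two arguments, each comparison must hold between every adjacent pair. The sketch below also shows the equivalent comparison on scalar values:
</p>
          <pre class="screen">(char&lt;? #\a #\b #\c)      ; ⇒ #t
(char&lt;? #\a #\c #\b)      ; ⇒ #f  (#\c is not less than #\b)
(char&lt;=? #\a #\a #\b)     ; ⇒ #t

;; Same result as (char&lt;? #\z #\ß), computed on the scalar values:
(&lt; (char-&gt;integer #\z) (char-&gt;integer #\ß))   ; ⇒ #t
</pre>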
        </blockquote>
      </div>
      <p class="synopsis" kind="Procedure"><span class="kind">Procedure</span><span class="ignore">: </span><a id="idm139667875805888" class="indexterm"/> <code class="function">digit-value</code> <em class="replaceable"><code>char</code></em></p>
      <div class="blockquote">
        <blockquote class="blockquote">
          <p>This procedure returns the numeric value (0 to 9) of its
argument if it is a numeric digit (that is, if <code class="literal">char-numeric?</code>
returns <code class="literal">#t</code>), or <code class="literal">#f</code> on any other character.
</p>
          <pre class="screen">(digit-value #\3)        ⇒ 3
(digit-value #\x0664)    ⇒ 4
(digit-value #\x0AE6)    ⇒ 0
(digit-value #\x0EA6)    ⇒ #f
</pre>
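          <p>As a sketch of how <code class="literal">digit-value</code> might be used, the following hypothetical helper <code class="literal">digits-&gt;integer</code> converts a string of decimal digits to an integer, returning <code class="literal">#f</code> as soon as it meets a non-digit. It is not part of Kawa.
</p>
          <pre class="screen">;; Hypothetical helper: parse a string consisting only of digits.
(define (digits-&gt;integer str)
  (let loop ((i 0) (acc 0))
    (if (= i (string-length str))
        acc
        (let ((d (digit-value (string-ref str i))))
          ;; d is #f for a non-digit, which ends the loop with #f
          (and d (loop (+ i 1) (+ (* acc 10) d)))))))

(digits-&gt;integer "2024")   ; ⇒ 2024
(digits-&gt;integer "12a")    ; ⇒ #f
</pre>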
        </blockquote>
      </div>
    </section>
    <footer>
      <div class="navfooter">
        <p>
          Up: <a accesskey="u" href="Characters-and-text.xhtml">Characters and text</a></p>
        <p>
        Next: <a accesskey="n" href="Character-sets.xhtml">Character sets</a></p>
      </div>
    </footer>
  </body>
</html>
